Intro to NGS processing
James A. Fellows Yates
2021-08-17
Who am I?
- Education
- B.Sc. Bioarchaeology (University of York, UK)
- M.Sc. Naturwissenschaftliches Archäologie (University of Tübingen, DE)
- Ph.D. Archaeogenetics (MPI-SHH / MPI-EVA, DE)
- Experience
- Number of genetics classes taken: 0
- Number of bioinformatics classes taken: 0
@jfy133
Today we will
- Introduce what DNA sequencing is
- Explain how Illumina NGS sequencing data is generated
- Show how to evaluate NGS data [Practical]
What is DNA?
Deoxyribonucleic acid (/diːˈɒksɪˌraɪboʊnjuːˌkliːɪk, -ˌkleɪ-/; DNA) is a molecule composed of two polynucleotide chains that coil around each other to form a double helix carrying genetic instructions for the development, functioning, growth and reproduction of all known organisms and many viruses. - Wikipedia
The rules
- Four nucleotides
- Pyrimidines:
Cytosine & Thymine
- Purines:
Guanine & Adenine
- Base pairing: one pyrimidine with one purine
C with G (think: CGI)
A with T (think: AT-AT walker)
- Complementary
C on one strand, G on the other (or v.v.)
A on one strand, T on the other (or v.v.)
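The pairing rules above can be sketched in a few lines of Python. This is only a toy illustration: the names `PAIRS` and `complement` are made up for this example, and strand direction is ignored.

```python
# Base pairing as listed above: C pairs with G, A pairs with T.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base that sits opposite each base of `strand`."""
    return "".join(PAIRS[base] for base in strand)


top = "ACCGT"
bottom = complement(top)

# Draw the two strands as a tiny double-stranded molecule.
print(top)
print("|" * len(top))
print(bottom)
```

Because the pairing is symmetric, taking the complement twice gives back the original strand.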

The rules
- Make a copy of a DNA strand with a polymerase
- Unwind the DNA
- Separate the strands
- Make new strand: find a C, add a new G (etc.)
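As a rough sketch of the copying steps above, the snippet below separates a toy double strand and builds a new partner for each half. The function names (`copy_strand`, `replicate`) are hypothetical, and the model deliberately ignores strand direction and everything else a real polymerase needs.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def copy_strand(template: str) -> str:
    """Toy 'polymerase': walk along the template and add the pairing base."""
    new_strand = []
    for base in template:                # find a C ...
        new_strand.append(PAIRS[base])   # ... add a new G (etc.)
    return "".join(new_strand)


def replicate(duplex):
    """Unwind, separate the strands, and copy each one."""
    top, bottom = duplex                 # unwind and separate
    return (top, copy_strand(top)), (copy_strand(bottom), bottom)


parent = ("GATTACA", "CTAATGT")
daughter_a, daughter_b = replicate(parent)
print(daughter_a)
print(daughter_b)
```

Each daughter molecule keeps one old strand and gains one new strand, and both end up identical to the parent.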
How do we get DNA?

Introduction to DNA Sequencing
What is Sequencing?
Converting the chemical nucleotides of a DNA molecule to ACTG on your computer screen
Historically

- Separate strands, add primer (starting point)
- Add mix of nucleotides, some with special ‘terminators’
- Pass through size-filtering, read order of terminators
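The idea behind these three steps can be sketched as follows. This is a simplified toy, assuming that a terminated copy exists for every possible stopping point; the names `sanger_fragments` and `read_by_size` are invented for the example.

```python
import random

PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_fragments(template: str):
    """Return one terminated copy per position as (length, terminator base).

    Assumption: copying starts right after the primer and every possible
    stopping point is represented at least once in the reaction.
    """
    fragments = []
    for stop, base in enumerate(template):
        fragments.append((stop + 1, PAIRS[base]))
    return fragments


def read_by_size(fragments) -> str:
    """Sort fragments from short to long and read off the terminators."""
    return "".join(base for _, base in sorted(fragments))


fragments = sanger_fragments("TACGGA")
random.shuffle(fragments)          # in the tube the fragments are unordered
print(read_by_size(fragments))     # the newly synthesised strand
```

Sorting by size is what restores the order: the shortest fragment ends in the first base, the next shortest in the second, and so on.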
Pros and cons of Sanger Sequencing
- Pros
- More accurate (fewer errors)
- Longer reads
- Cons
- Resource heavy: needs a lot of input DNA
- Slow: one. fragment. at. a. time.
What is NGS?
- NGS: Next Generation Sequencing
- MASSIVELY multiplexed!
- Sequence millions and even billions of DNA reads at once!
Not really ‘next’ anymore, consider it more ‘second’ generation (see: Nanopore)
What is NGS?
Market leader: Illumina
(Others: Roche 454, PacBio, IonTorrent etc.)
How does it work?
- Basically same concept, but:
- no size separation
- with pretty pictures!
i.e. attach fluorophore-modified nucleotides, (normally) one colour per base
A
G
T
C
Fire mah lazer, and take a picture! Rinse and repeat!
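A minimal sketch of the 'one colour per base' idea: each photo records a colour, and the colour is translated back into a base. The colour assignment below is hypothetical and chosen only for illustration; it is not taken from any real instrument.

```python
# Hypothetical colour assignment, one colour per base.
COLOUR_TO_BASE = {"green": "A", "blue": "G", "red": "T", "yellow": "C"}


def call_bases(photos) -> str:
    """Translate the colour seen in each photo into a base."""
    return "".join(COLOUR_TO_BASE[colour] for colour in photos)


# One photo per round of 'rinse and repeat'.
photos = ["red", "green", "green", "yellow", "blue"]
print(call_bases(photos))
```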
Where does this happen?
On a ‘flow cell’

Where does this happen?
But how do you get your DNA to attach to the lawn (and not get lost)?
- Convert it to a library:
- Add adapters: bind to the ‘lawn’ of the flow cell
- Add indexes: sample-specific barcode
- Add priming sites: where enzymes start copying DNA
AATGATACGGCGACCACCACaccgacaaCCCTACACGACGCTCTTCCGATCTXXXXXXAGCACACGTCTGAACTCCAGTCACgacactaCCGTCTTCTGCTTG
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TTACTATGCCGCTGGTGGTGtggctgttGGGATGTGCTGCGAGAAGGCTAGAXXXXXXTCGTGTGCAGACTTGAGGTCAGTGctgtgatGGCAGAAGACGAAC
[Adapter & Index Primer] [Index] [Target primer] [Target] [Target primer] [Index] [Adapter & Index Primer]
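To make the layout above concrete, here is a small sketch that assembles the top strand of a library molecule from its parts and then reads the two indexes back out. The sequences are copied from the diagram; the constant and function names are invented for this example, and finding indexes by searching for the primer text is a simplification.

```python
# Top-strand pieces copied from the diagram above.
LEFT_ADAPTER = "AATGATACGGCGACCACCAC"
LEFT_TARGET_PRIMER = "CCCTACACGACGCTCTTCCGATCT"
RIGHT_TARGET_PRIMER = "AGCACACGTCTGAACTCCAGTCAC"
RIGHT_ADAPTER = "CCGTCTTCTGCTTG"


def build_library_molecule(target: str, left_index: str, right_index: str) -> str:
    """Glue adapter, index and priming site onto both ends of the target."""
    return (LEFT_ADAPTER + left_index + LEFT_TARGET_PRIMER
            + target
            + RIGHT_TARGET_PRIMER + right_index + RIGHT_ADAPTER)


def read_indexes(molecule: str):
    """Recover the sample-specific barcodes from a library molecule."""
    left_start = len(LEFT_ADAPTER)
    left_end = molecule.index(LEFT_TARGET_PRIMER)
    right_start = molecule.rindex(RIGHT_TARGET_PRIMER) + len(RIGHT_TARGET_PRIMER)
    right_end = len(molecule) - len(RIGHT_ADAPTER)
    return molecule[left_start:left_end], molecule[right_start:right_end]


molecule = build_library_molecule("XXXXXX", "accgacaa", "gacacta")
print(molecule)
print(read_indexes(molecule))
```

Because every sample gets its own index combination, reads from many samples can share one flow cell and still be sorted back to their sample afterwards.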
Sequencing-by-synthesis
Add DNA to the flow cell, but problem: the fluorescence of one single nucleotide is not enough to see…

Make lots of copies!
Sequencing-by-synthesis

1. Add fluorescent nucleotides (the complementary one will bind)
2. Fire laser & take photo
3. Wash away unbound nucleotides
4. Remove fluorophore
5. Back to 1 ⤴️
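The cycle above can be sketched as a loop over a toy flow cell with several clusters imaged at once. Everything here is an illustrative assumption: the colour assignment, the `(x, y)` cluster positions, and the function name `sequence_by_synthesis` are all made up, and every template must be at least `cycles` bases long.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hypothetical colour assignment, one colour per base.
COLOUR = {"A": "green", "G": "blue", "T": "red", "C": "yellow"}
BASE = {colour: base for base, colour in COLOUR.items()}


def sequence_by_synthesis(clusters, cycles: int):
    """Run `cycles` rounds of add / photograph / wash / remove fluorophore.

    `clusters` maps a position on the flow cell to the template found there.
    """
    reads = {position: "" for position in clusters}
    for cycle in range(cycles):
        # Add nucleotides (the complementary one binds), fire laser, take photo.
        photo = {
            position: COLOUR[PAIRS[template[cycle]]]
            for position, template in clusters.items()
        }
        # Wash and remove the fluorophore: nothing to do in this toy model.
        # Turn the colour seen at each position into a base.
        for position, colour in photo.items():
            reads[position] += BASE[colour]
    return reads


clusters = {(0, 0): "ACGT", (3, 7): "TTGA"}
print(sequence_by_synthesis(clusters, cycles=4))
```

One photo per cycle covers every cluster at once, which is where the massive throughput comes from.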
What does this look like?

Improving quality
One problem: over time, imaging gets worse
Throughput limits
Paired end
Paired end sequencing
Sequence one end, bend over, attach the other end (turnaround), and start again from the other end of the molecule
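A minimal sketch of what paired-end sequencing yields for a single fragment, assuming the second read comes from the opposite strand and starts at the far end. The names `reverse_complement` and `paired_end_reads` are chosen for this example only.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read from its own starting end."""
    return "".join(PAIRS[base] for base in reversed(seq))


def paired_end_reads(fragment: str, read_length: int):
    """Read `read_length` bases from each end of the fragment."""
    forward = fragment[:read_length]
    reverse = reverse_complement(fragment)[:read_length]
    return forward, reverse


fragment = "AACCGGTTAC"
forward, reverse = paired_end_reads(fragment, read_length=4)
print(forward, reverse)
```

If the fragment is shorter than twice the read length, the two reads overlap in the middle and cover the same bases twice.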
Cons of NGS sequencing
- Less accurate (the laser/photo step can get the base wrong)
- Chemistry limits (DNA strands get damaged by heat cycling for denaturing, dirty environment from suboptimal wash steps, etc.) mean short reads (compensated by volume)